A Kolmogorov-Smirnov Correlation-Based Filter for Microarray Data
Authors
Abstract
A filter algorithm based on the F-measure has been combined with feature redundancy removal based on the Kolmogorov-Smirnov (K-S) test for rough equality of statistical distributions. The result is a computationally efficient K-S Correlation-Based Selection algorithm, which has been tested on three high-dimensional microarray datasets using four types of classifiers. The results are quite encouraging, and several improvements are suggested.
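The two-stage idea described in the abstract (score each feature's relevance with the F-measure, then discard features whose distributions are roughly equal under the K-S test) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the F-measure is the one-way ANOVA F statistic, applies the two-sample K-S test directly to raw feature values, and uses the standard asymptotic critical value $c(\alpha)\sqrt{(n+m)/(nm)}$ with $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$. The function names `f_measure`, `ks_statistic`, `ks_critical` and `ks_cbs` are hypothetical.

```python
import math
from typing import List, Sequence


def f_measure(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F statistic of a single feature across class labels.

    Assumption: "F-measure" is read here as the ANOVA F statistic.
    Requires at least two classes and more samples than classes.
    """
    n = len(values)
    classes = sorted(set(labels))
    k = len(classes)
    grand_mean = sum(values) / n
    ss_between = 0.0  # spread of class means around the grand mean
    ss_within = 0.0   # spread of samples around their own class mean
    for c in classes:
        group = [v for v, lab in zip(values, labels) if lab == c]
        mean_c = sum(group) / len(group)
        ss_between += len(group) * (mean_c - grand_mean) ** 2
        ss_within += sum((v - mean_c) ** 2 for v in group)
    if ss_within == 0.0:
        # Perfectly separated feature (inf) or constant feature (0).
        return math.inf if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample K-S distance: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value <= x, then compare them.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical(n: int, m: int, alpha: float) -> float:
    """Asymptotic critical value of the two-sample K-S test at level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


def ks_cbs(X: Sequence[Sequence[float]], y: Sequence[int],
           alpha: float = 0.05) -> List[int]:
    """Hypothetical K-S CBS-style filter; returns indices of kept features.

    X holds one row per sample. Features are visited in decreasing F-measure
    (ties keep their original order); a lower-ranked feature is dropped as
    redundant when the K-S test cannot reject that its distribution equals
    that of a feature already kept.
    """
    n_samples, n_features = len(X), len(X[0])
    columns = [[row[f] for row in X] for f in range(n_features)]
    scores = [f_measure(col, y) for col in columns]
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    threshold = ks_critical(n_samples, n_samples, alpha)
    selected: List[int] = []
    removed = set()
    for pos, f in enumerate(ranked):
        if f in removed:
            continue
        selected.append(f)
        for g in ranked[pos + 1:]:
            if g not in removed and ks_statistic(columns[f], columns[g]) < threshold:
                removed.add(g)
    return selected
```

For example, if a feature is duplicated verbatim, the copy has a K-S distance of zero to the kept original and is dropped, while a feature with a clearly different value distribution survives even if it is weakly relevant. Here `alpha` plays the role of the single confidence-level parameter: a smaller `alpha` raises the critical value, so more feature pairs are judged to have equal distributions and more features are removed.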
Similar resources
Feature Selection for High-Dimensional Data: A Kolmogorov-Smirnov Correlation-Based Filter
An algorithm for filtering information based on the Kolmogorov-Smirnov correlation-based approach has been implemented and tested for feature selection. The only parameter of this algorithm is the statistical confidence level that two distributions are identical. Empirical comparisons with four other state-of-the-art feature selection algorithms (FCBF, CorrSF, ReliefF and ConnSF) are very encouraging.
Important Features PCA for high dimensional clustering
We consider a clustering problem where we observe feature vectors $X_i \in \mathbb{R}^p$, $i = 1, 2, \ldots, n$, from $K$ possible classes. The class labels are unknown and the main interest is to estimate them. We are primarily interested in the modern regime of $p \gg n$, where classical clustering methods face challenges. We propose Important Features PCA (IF-PCA) as a new clustering procedure. In IF-PCA, we select a ...
Influential Features Pca for High Dimensional Clustering
We consider a clustering problem where we observe feature vectors $X_i \in \mathbb{R}^p$, $i = 1, 2, \ldots, n$, from $K$ possible classes. The class labels are unknown and the main interest is to estimate them. We are primarily interested in the modern regime of $p \gg n$, where classical clustering methods face challenges. We propose Influential Features PCA (IF-PCA) as a new clustering procedure. In IF-PCA, we select...
A permutation test motivated by microarray data analysis
We introduce a nonparametric test intended for large-scale simultaneous inference in situations where the utility of distribution-free tests is limited because of their discrete nature. Such situations are frequently dealt with in microarray analysis where the number of tests is much larger than the sample size. The proposed test statistic is based on a certain distance between the distribution...
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used to improve cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not useful for some tasks, for example cancer classification....